Bottom-Up Learning of Phonemes: A Computational Study

Author

  • Rozenn Le Calvez
Abstract

We present a computational evaluation of the hypothesis that distributional information is sufficient to acquire allophonic rules (and hence phonemes) in a bottom-up fashion. The hypothesis was tested using a measure based on information theory that compares distributions. The test was conducted on several artificial language corpora and on two natural corpora containing transcriptions of speech directed to infants from two typologically distant languages (French and Japanese). The measure was complemented with three filters, one concerning statistical reliability due to sample size and two concerning the following universal properties of allophonic rules: constituents of an allophonic rule should be phonetically similar, and allophonic rules should be assimilatory in nature.

Acquisition of Allophonic Rules

During their first year of life, infants learn many aspects of the phonology of their native language. At birth, they all discriminate speech segments (atomic units corresponding to consonants and vowels) in a language-universal way. Their perception then becomes attuned to their native language: at 6-8 months (Kuhl et al. 1992), infants learn the vowel categories of their native language, and at 10-12 months the consonant categories (Werker & Tees 1984). These remarkable steps are reached before infants have a lexicon and before they can talk. Infants' learning mechanisms include extracting statistical regularities present in the speech signal, such as frequency distributions of segments and transitional probabilities between segments (Jusczyk 1997; Maye, Werker & Gerken 2002). One aspect of early phonological acquisition that remains to be studied is how children go beyond the segmental representation and acquire the phonemes of their language.

Phonemes and Allophonic Rules

Languages represent speech sounds at two levels. At the abstract (underlying) level, word forms are made up of a combination of a finite set of phonemes.
At the surface level, the pronunciation of a word is specified in terms of a larger set of context-dependent segments. For instance, in Mexican Spanish, the word /felis/ (feliz, "happy") is pronounced as [feliz] when it is followed by a voiced consonant and as [felis] otherwise. Allophonic rules express the phonetic realizations of a given phoneme according to its context. For instance, the allophonic rule of voicing in Mexican Spanish is written as follows:

$$
/s/ \rightarrow
\begin{cases}
[z] & \text{before voiced consonants} \\
[s] & \text{elsewhere}
\end{cases}
\qquad (1)
$$

[z] is called the allophone and [s] the default segment. These two segments never occur in the same contexts: they are in complementary distribution. Whether a given pair of segments is in an allophonic relationship or not is language-specific: unlike in Mexican Spanish, /s/ and /z/ are two distinct phonemes in French, so that /felis/ and /feliz/ could be two different French words. Therefore, phonemes (and allophonic rules) must be learned at some point in the course of language acquisition.

A Bottom-Up Hypothesis

When and how phonemes are learned is controversial. Phonemes might be learned in a top-down fashion with the help of the lexicon or the orthography: knowing the abstract form of a word, children would learn to match phonemes with their phonetic realizations (Kazanina, Phillips & Idsardi 2006). Alternatively, phonemes might be learned very early in life, before infants have a lexicon, based on complementary distributions of segments (Peperkamp & Dupoux 2002). This is the hypothesis we endorse here.

A Computational Evaluation

We present a computational study of the bottom-up learning of phonemes. Our position is that a significant number of allophonic pairs can be acquired without the help of lexical information.
We suppose that, prior to this acquisition, infants can extract phonetic segments from the speech they hear and use statistical learning mechanisms, and that they have a similarity metric allowing them to compare the segments of their native language (Liberman & Mattingly 1985). We investigate to what extent statistical information is sufficient and to what extent other information (in the form of linguistic biases) might be necessary.
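
To make the idea of comparing distributions concrete, the following is a minimal sketch of one possible information-theoretic measure: a symmetrized Kullback-Leibler divergence between the distributions of contexts that follow two segments. The abstract does not specify which measure the study uses, so the choice of divergence, the add-one smoothing, the restriction to the following context, the function names, and the toy corpus are all assumptions made purely for illustration.

```python
import math
from collections import Counter

END = "#"  # hypothetical symbol marking the end of an utterance


def following_counts(corpus, segment):
    """Count which symbol immediately follows each occurrence of `segment`."""
    counts = Counter()
    for utterance in corpus:
        padded = list(utterance) + [END]
        for current, following in zip(padded, padded[1:]):
            if current == segment:
                counts[following] += 1
    return counts


def smoothed(counts, contexts, alpha=1.0):
    """Turn counts into probabilities, adding `alpha` to every context (assumed smoothing)."""
    total = sum(counts.values()) + alpha * len(contexts)
    return {c: (counts[c] + alpha) / total for c in contexts}


def context_divergence(corpus, seg_a, seg_b, alpha=1.0):
    """Symmetrized KL divergence between the following-context distributions."""
    contexts = sorted({s for utt in corpus for s in utt} | {END})
    p = smoothed(following_counts(corpus, seg_a), contexts, alpha)
    q = smoothed(following_counts(corpus, seg_b), contexts, alpha)
    # KL(p||q) + KL(q||p) simplifies to sum over contexts of (p - q) * log(p / q).
    return sum((p[c] - q[c]) * math.log(p[c] / q[c]) for c in contexts)


# Toy surface forms, one character per segment (invented for this sketch).
# [z] appears only before voiced consonants, [s] only elsewhere.
corpus = [
    "felizbebe", "felizdia", "mizmo",   # [z] before b, d, m
    "felispapa", "felis", "kasa",       # [s] before p, utterance end, a
    "peska", "keso",                    # [s] before k, o
]

for pair in [("s", "z"), ("p", "k")]:
    print(pair, round(context_divergence(corpus, *pair), 3))
```

In this toy corpus [s] and [z] never share a following context, so their score is higher than that of [p] and [k], which occur before exactly the same segments. A high score on its own would also flag unrelated pairs that merely happen not to share contexts, which is consistent with the abstract's point that the measure is complemented with filters for sample size, phonetic similarity, and assimilation.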

Publication date: 2007